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A METHOD OF TRAJECTORY OPTIMIZATION BY FAST -TIME 


REPETITIVE COMPUTATIONS 

By Rodney C. Wingrove, James S. Raby, 
and D. Francis Crane 
Ames Research Center 


SUMMARY 


This report presents a perturbation method of computing optimum 
trajectories wherein control impulse response functions are determined by 
fast-time repetitive computations of the state equations. This method does 
not require the solution of the auxiliary set of adjoint equations used with 
other perturbation methods. 

The mechanization of this computing method on a hybrid computer is dis- 
cussed and an application to the steepest descent optimization of reentry 
trajectories is presented. In this example, the vehicle is to arrive at a 
desired range, the heat input to the vehicle is to be minimized, and the con- 
trol is to remain within specified constraints. Optimum trajectories for 
this example could be obtained in about 8 minutes of computing time. 


INTRODUCTION 


It is important for space vehicle trajectories to be near optimum in the 
sense that some quantity is either maximized or minimized. For example, in 
reentry the trajectory to desired terminal conditions is near optimum when 
the total aerodynamic heating is a minimum. This report will consider a 
method for finding the time histories of nonlinear controls that correspond 
to optimum trajectories. Several perturbation methods, such as the calculus 
of variations, applications of the maximum principle, and direct steepest 
descent, have been considered for solving this control optimization problem. 
Reference 1 contains a good review of these various methods and reference 2 
gives several analog and digital computing techniques for implementing them. 

In principle, each of these techniques should give satisfactory results, 
but it has been found that for many trajectory problems the computer mechani- 
zations are cumbersome and require programs that are difficult for engineers 
to formulate. The computing method to be reported herein was investigated in 
an attempt to alleviate these difficulties and to provide a more direct way 
of computing optimization solutions. 

In previous optimization studies using perturbation techniques the 
computations have involved the dynamic solution of two sets of equations: 

(l) nonlinear state equations and (2) linear adjoint equations. The method 
to be reported herein differs in that only the solution of the nonlinear state 



equations is used. The response of given functions (e.g., terminal error or 
quantity to be optimized) to a control impulse is determined along the trajec- 
tory by fast -time repetitive computations rather than by a solution of the 
adjoint equations. Since auxiliary adjoint equations are not needed, the 
investigator should understand the optimization process more easily; also the 
computer program should be simpler. However, this new alternate computing 
method does require many solutions of the state equations. This task of com- 
puting a large number of dynamic solutions is ideally suited to high-speed 
repetitive hybrid computation as will be considered herein. 

This report will present one application of this computing technique; that 
of trajectory optimization using the steepest descent method (refs. 3 and. 4). 
The mechanization of this method on a hybrid computer will be discussed and 
results will be presented to illustrate the use of this procedure in the 
optimization of reentry trajectories. For the interested reader appendix A 
illustrates the relationship of the impulse response functions computed in this 
report to the solutions obtained with the adjoint equations and to the maximum 
principle of optimization. This appendix also provides a background for under- 
standing the steepest descent optimization equations. 


NOTATION 


The following notation is used in the body of the text. Additional 
symbols are described as they are introduced in the appendixes. 

- control value of lift -drag ratio 

n number of storage points in control time history 

t time 

tf final time 

t Q initial time 

At time increment of control impulse 

u control function 

Au control impulse 

cp cost function at final time 

Ar(t) change in cost function at final time due to control impulses at 
time t 

\[/ state value at final time 
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\|r^ desired state value at final time 

A\|r(t) change in state value at final time due to control impulses at time t 


METHOD 


The method of steepest descent (refs, 3 and 4) is an iterative procedure 
that has been used for optimizing trajectories. The process commences with 
any nonoptimal trajectory from which a slightly improved one is derived. The 
improved trajectory is then used as a new nominal trajectory, and the procedure 
is repeated until the optimum or nearly optimum trajectory is found. 


General Outline 

The iteration is as follows: (l) Estimate a reasonable program that 

nearly satisfies the terminal conditions for specified initial conditions; 

(2) determine impulse response functions that describe the effects of small 
changes in the control on the terminal state and on the cost (the quantity to 
be minimized). These impulse response functions, combined with steepest 
descent computations, indicate the best possible way of making small changes 
to the control to decrease the cost and still arrive at the end point; ( 3 ) add 
this change in control to the previous nominal control program. The result is 
a new trajectory with a decreased cost; (4) repeat this process until there 
exists only a very small change in cost for each new trajectory, indicating 
that the control is very near a local optimum. A limit value of the control 
may be reached before the cost is completely minimized. In this case, the 
process is continued until the constraint (control limit) is reached, since 
no further optimization is possible. 

The properties of steepest descent optimization have been documented in 
many previous studies (e.g., refs. 5-7)* Although this method has been 
regarded as the most practical in many applications, there is no guarantee 
that it yields the absolute optimum. That is, for some initial choices of the 
nominal trajectory, the final optimized trajectory may represent only a local 
optimum path. Also, in some applications, where the cost function may be 
relatively insensitive to control variations, a large number of iterations may 
be necessary to approach the optimum solution. 


Computation of Impulse Response Functions 

To illustrate the computation of the impulse response functions let the 
quantity to be minimized be noted as cp, the cost evaluated at the final time. 
Let the state variable at the final time be noted as \|r and let the desired 
end-point value for this be denoted i|f^. 

Figure 1 illustrates the manner in which the influence of small control 
changes on cp and \|r are calculated in this report. The equations of motion 
are first solved with a control change, a positive control impulse at time t, 
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superimposed upon the nominal control. During the next solution of the motion 
equations, a negative control impulse of the same magnitude is inserted at time 
t. The change in cost, Acp, and the change in terminal state, A\|/, are derived 
from these two solutions. In a similar manner the impulse response functions 
can be progressively determined at successive times along the trajectory, and 
the technique by which they are determined is the most important feature of 
this computing method. The computation of the full history of Ap(t) and A\|/(t) 
for the same control impulse at different times along the trajectory is termed 
one "iteration” since it corresponds to the previous optimization studies where 
one iteration with the adjoint equations was used to compute essentially these 
same functions along the trajectory. Appendix A shows the relationship of this 
experimental determination to the standard determination using adjoint equa- 
tions. These experimental impulse response functions are shown to be directly 
related to the well-known "Green 1 s functions." 


Steepest Descent Optimization Equations 


The impulse response functions are used in the steepest descent technique 
to modify the control toward the optimum in the following manner (see appen- 
dix A) . 


( New \ / PreviousX 

nominal U nominal 
control J V control J 


+ K<p Ap(t) + At(t) 


( 1 ) 


The gains K<p and are constants for each iteration. The gain K<p weights 
the impulse response function for the cost; its sign is negative in order to 
decrease the cost. The magnitude of K^p is determined experimentally for each 
problem. Too large a gain may cause instability in the convergence procedure, 
while too small a gain may extend the time of convergence. 


The gain must be calculated for each iteration so that the term 

At(t) will account for terminal displacement due to the optimizing term, 

K<p Ap(t), and any terminal displacement error from the previous iteration will 
be corrected. The formulation for calculating is as follows: 

Small changes, 5\|r, in the terminal state, due to small changes, 6u(t), 
in control can be approximated by: 


Bf 


1 

2 Au At 



5u( t)A^( t) dt 


to 


( 2 ) 


where Au is the height of each control impulse and At is the time interval 
of each control impulse. Substituting Ap(t) + A\|r(t) from (l) for Su, 

we have : 


r 


1 

2 Au At 



[ Kq, Ap(t)At(t) + At 2 (t)]dt 


( 3 ) 


k 


ip- 


Solving for and letting -5^ = td “ ^ (previous terminal error) we obtain: 

*d - f 


% = -Kcp 


tf 

J Aip ( t ) Aijf ( t ) dt 
to 


J f A\|r 2 (t) dt 


+ 2 Au At 


J* ^ A\]/ 2 (t)dt 


(b) 


Steepest descent 
optimization term 


Terminal error 
correction term 


This gives the general form of the steepest descent equations. The actual cal- 
culations are next considered in more detail. 


HYBRID COMPUTER MECHANIZATION 


The mechanization of the optimization procedure on a high-speed repetitive 
analog computer is presented in figure 2. Figure 2(a) is the flow diagram and 
figure 2(h) illustrates the logic used in automatically regulating the problem. 
The mechanization consists of an analog computer program for solving the tra- 
jectory equations; logic required to coordinate the procedure; and a serial 
memory storage unit for storing the nominal control program. 

The serial memory unit is continuously driven by counter pulses (Logic 
no. l) . The output of the serial memory is the nominal control time history 
with n points that is used along with the appropriate control impulse, to 
solve the trajectory equations. These equations are started at the specified 
initial conditions with Logic no. 2 and stopped when the trajectory reaches 
the specified end point with Logic no. 3- The final values of the cost quan- 
tity, cp, and state quantity, \| /, are stored at the end of each run as indicated 
by Logic nos. 4 and 5« The positive or negative control impulse is added to 
the nominal control input with Logic nos. 6 and respectively. Logic no. 8 
inserts the modifying control (K<p A\|r) into the serial memory. This 

procedure runs in essentially a continuous manner; that is, one point out of 
the n points in the nominal control history is updated after each two repet- 
itive computations, and after 2n repetitive computations (one iteration), 
every point in storage has been modified and the process is repeated. For each 
iteration the gains K<p and are held as constants. As previously men- 

tioned, the value of Kq) determines the relative speed and stability of the 
convergence onto the optimum. The corresponding value of to be used with 

each new iteration is calculated by equation (4) as a function of the terminal 
error from each previous iteration (^ - \|r) and as a function of the following 
two integrated quantities from each previous iteration: 

tf 

J Acp(t)A\|r(t)dt (5) 

t c 
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and 


I* ^ A\|r 2 (t)dt (6) 

to 

The values for equations (5) and (6) were computed as integrals over the 
time period from t = t Q to t = tf . The time t D was represented by a logic 
signal at the first repetitive computation in an iteration cycle and the time 
tf was represented by a logic signal at the last computation in an iteration 
cycle. It should be noted that during those parts of the trajectory when the 
control was at a constraint limit, no further optimization was possible, and the 
integration of equations (5) and (6) was therefore not carried out during those 
times . 

This type of computer mechanization will be illustrated in more detail for 
the following example problem. 


APPLICATION TO REENTRY TRAJECTORY OPTIMIZATION 


Statement of the Problem 

The problem to be illustrated in this section is that of determining the 
time history of the variation of lift-drag ratio (control L/D) that must be 
flown for a vehicle returning into the earth's atmosphere so that: 

The total heating load to the vehicle is minimized. 

The vehicle arrives at a desired destination. 

The control remains within specified constraints. 


Mechanization 


The equations of motion, presented in appendix. B, were for a point mass in 
planar motion over a spherical nonrotating earth. The vehicle characteristics 
and flight conditions were those for a manned capsule returning from earth 
orbit. 


Initial conditions were : 


Altitude 

Velocity 

Flight -path angle 
Range to destination 


76.3 km 
7.63 km/s 
-1.8° 

1609 km 


(250,000 ft) 
(25,000 fps) 

(1000 mi) 


Final stopping conditions were: 
Altitude 


30 . 48 km (100,000 ft) 
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Control limits were : 


o < | < 0.5 


The main hardware elements used in the hybrid computer mechanization were: 

Hardwar e element s Program task 

Analog computer Solution of trajectory equations 

Parallel digital logic units Logic control of program 

Track and store amplifiers Storage of end-point values 

Digital delay line memories 1 Storage of control time history 

(with D/A and A/D converters) 

The 64-word digital serial memory unit (13 hits per word) was accessed 
with the fastest allowable counter rate (0.002 sec). A complete 64-word cycle 
was then available every 0.128 second. To allow a complete solution of the 
trajectory equations within 0.128 second, the analog computer was time scaled 
at 3750 to 1. 


Results 

A series of computer runs for this problem is illustrated in figure 3- 
Figure 3(a) presents the details of each repetitive trajectory computation and 
figure 3(b) presents the details of the overall convergence onto the optimum 
nominal control. Figure 3(a) shows just a portion of iteration no. 0 as pre- 
sented in figure 3(b) . 

In the upper trace of figure 3(a) the control impulses are superimposed 
upon the initial nominal control. Each control impulse had a magnitude of 
L/D = ±0.25 and a time increment of one clock pulse (0.002 sec). This control 
impulse was chosen because it gave variation in the final range and heat load 
on the order of ±5 percent. The range and integrated heat load along each of 
the repetitive trajectories are presented along with the final values as they 
are stored with track and store amplifiers. The difference between these 
stored quantities for each two pairs of subsequent runs is At representing 
the range impulse response functions and Zkp representing the heat load 
impulse response functions. 

In figure 3(b) the first 10 iterations (each iteration consists of 128 
repetitive computations) of the converging optimization procedure are illus- 
trated along with the final iteration. 

During the convergence procedure the range is seen to vary slightly about 
the desired value of 1609 km (1000 miles). The heat load is shown to be 
reduced about 10 percent during the first ten iterations and diminished to 
about 12 percent from the original with the final (optimum) control variation. 

X A series of track and store amplifiers could also have been used for 
this storage. 
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The modifying control shown in the figure is the sum K^> /kp + Av|r. For 
this series of runs a constant value of = -2. 5x10 ~ 7 [ units of (L/D)/(j/m 2 ) ] 
was found to^ allow a fairly rapid convergence while maintaining program stabil- 
ity. The value of was calculated by equation (4) to be that value for 

each iteration such as' to allow convergence in the steepest descent manner. 

In the lower trace of figure 3(t>) the nominal control is recorded as it is 
read out of serial memory every 128 +1 counter pulses (with Logic no. 8). This 
gives a convenient time history to show the manner in which the control has 
been modified during each iteration. The control is seen to be limited within 
0 < L/D <0-5* This was achieved simply by limiting the output of the serial 
memory to within these values. 

As can be seen, the optimum control variation for this case is a bang- 
bang control. With the steepest descent method, it is usually found that near- 
optimum control can be achieved in the first few iterations, but that to 
"square up the corners" and achieve full optimum control a number of further 
iterations (on the order of 20 to 50 ) are required. 


Convergence and Stability Considerations 

One of the Important aspects of any optimization scheme is the ability to 
converge, within a reasonable time, onto the optimum solution. For the partic- 
ular method in this report it has been pointed out that this convergence pri- 
marily depends upon choosing the proper value of the gain K^. In the example 
problem, it was found that using any value of | ( less than 2. 5x10 “ 7 
[units of (L/D) /(j/m 2 ) ] resulted in smooth convergence; however, the convergence 
time (which was proportional to l/K^) became long. For initial values of 
greater than twice the aforementioned value the convergence became unstable, 
that is, the modifying 5 control became so large as to change drastically 
the state variable from their nominal final values. 

It was found that, as the optimum control was approached (after about 10 
iterations), the value of could be increased and convergence stability 

maintained, because in these examples the control approached bang -bang and only 
small changes were possible near the saturation limits. The value of in 

these cases could be increased to about 10 times the aforementioned value, but 
increasing it much farther (without analog voltage scaling changes) would allow 
extraneous computer noise to be magnified to a point where it caused notable 
random fluctuations in the computations . 

For a reasonable value of gain, such as that used for the example problem, 
the time to converge to a near optimum solution (ll iterations) was about 3 
minutes, and to a full optimum solution (30 iterations), about 8 minutes. 
Further changes in these convergence times, of course, depend upon several fac- 
tors. For instance, the convergence time in this computing setup was in pro- 
portion to n 2 , where n is the number of points describing the control time 
history (64 points for the case cited). Also the allowable solution rates of 
the computer elements directly affect the convergence time. The continuing 
development and use of high-speed computing elements will certainly result in 
convergence times smaller than the time cited. 
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The results obtained by this computing method appear satisfactory for 
engineering purposes; however, the usual disadvantages of analog computation 
are inherent with this method. These disadvantages are primarily concerned 
with the extraneous noise in the computations and the absolute accuracy (only 
to within about 1 percent) of analog computer. 


CONCLUDING REMARKS 


This report has presented a perturbation method of computing optimum 
trajectories. The technique uses fast -time repetitive computations in deter- 
mining control impulse response functions and requires only the dynamic solu- 
tion of the state equations; whereas other perturbation computing techniques 
have required the solution of additional adjoint equations, 

A hybrid computer was used in applying the method to the steepest descent 
optimization of reentry trajectories. Mechanizing the computer for this type 
of problem was relatively simple, and near optimum trajectories could be 
obtained in about 3 minutes of computing time and full optimum trajectories in 
about 8 minutes. 

The advantage of the technique outlined here over alternative techniques 
is that the investigator need not be familiar with or use an auxiliary set of 
linear adjoint equations for the optimization. This technique does, however, 
require a large number of dynamic solutions of the state equations, but this 
computing task appears practical with the high-speed repetitive computation 
procedure presented in this report. 


Ames Research Center 

National Aeronautics and Space Administration 
Moffett Field, Calif., Jan. 2k, 1 966 
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APPENDIX A 


RELATIONSHIP OF EXPERIMENTAL IMPULSE RESPONSE FUNCTIONS 
TO ADJOINT SOLUTIONS AND THE MAXIMUM PRINCIPLE 


Adjoint Solution of Impulse Response Functions 


Let the state equations he noted as 

x = f[x(t), u(t)] (Al) 

where x(t) is a vector of state variables, u(t) is the control variable and f 
is a vector of known functions of x(t) and u(t). 


The auxiliary adjoint equations can be noted as 


A = - 



(A2) 


T 

where A is a vector of influence functions and (df/cht) is the transpose 
of the matrix describing linear motions about the nominal path of x(t) and 
u(t) . 

It is well known that any small change in the control quantity 5u(t) 
along the nominal path will determine a change Sep in any quantity cp at the 
final time as follows: 


Sep = 



(A3) 


where Aq, represents a solution of the adjoint equations with the boundary 
conditions at the final time of 


a T\ _ (%\ 


'T 


A=t f 


w t= 


(A4) 


t=tf 


m 

The quantity A (df/du) within the integral is known as Green's function. 


10 



Experimental Determination of Impulse Response Functions 


The method used in the test of this report for finding a change Sep is to 
perturb the control experimentally in the following manner: 




^-At 

T 


^Control impulse 

Au 



* 


^ Nominal control 


to » f t 

With two sequential dynamic solutions of the state equations using first a 
positive control impulse and then a negative control impulse, the following is 
avai lah le : 

26cp = 9(+iinpulse) “ ^(-impulse) (-A-5) 


From equation (A3) we can write the change 2Scp = Ap , for a small control, 
Su = Au, acting over a small time interval At, as follows: 


2 Sep - Ap 



(A6) 


This then represents the correspondence between the impulse response 
functions calculated in the text and those solved by the adjoint solution. 
Green’s function evaluated at any time, t, along the trajectory can be noted 
as 


T df _ Ap(t) 

^ 2 Au. At 


and 


(at) 


< 


df At(t) 

* Su " 2 Au At 


(AS) 


e 


Relationship to the Maximum Principle 


The maximum principle (ref. 8) states that a necessary condition for a 
minimum (maximum) of the cost function is that the Hamiltonian be maximized 
(minimized) with respect to the control at all times. The Hamiltonian can be 
written as 

H = Af (A9) 
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where the transversality condition must he satisfied at the final time. 


T 

(a f) 


t=t. 


St 


n *£\ 
n at/ 


(A10) 


t-tf 


and r] is a Lagrange multiplier constant chosen so that the terminal con- 
straint is satisfied. The boundary conditions on the adjoint equations at the 
final time are: 




dx A= tf 


(All) 


Now to determine if H is minimized with respect to the control we can 
take the partial derivative of H with respect to u: 


3h _ t af 

an = A Su 


(A12) 


Or, noting the correspondence between equations (a 4) and (All) , we can write 


aH T df . T df 

chi ^ du + ^ ^ Su 

where ?w(Sf/Su) is Green 1 s function for the cost and 
function for the terminal constraint. 


(A13) 

T 

A^(Sf/Su) is Green 1 s 


Recalling the correspondence between the adjoint solution for Green* s 
function and that determined experimentally, we have the following: 


dH _ Acp(t) Ai|f(t) 

Su ” 2 Au At + 71 2 Au At 


(A110 


This, then, represents the relationship between the experimentally 
determined impulse response functions and the Hamiltonian. It is interesting 
that the maximum principle can be applied through this relationship without any 
need for solving the adjoint equations. 


Steepest Descent Equations 


Of 

The greatest change, 59 , in cp for a given value of 1 5 u 2 (t)dt is 


obtained (ref. 3 ) when 


Su(t) = K^p Ap(t) + At(t) 


(A15) 
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where IQp and K^. are constants. This is the steepest descent (or ascent) 
direction to the minimum (or maximum) cp. 

When there are no state or control constraints, the steepest descent 
procedure converges toward the necessary conditions for an optimum solution as 
previously noted 

Ap(t) + r) At(t) = 0 (A1 6 ) 

where in the steepest descent equations, K^/Kcp = rj, on the optimum solution 
with the terminal constraint satisfied. 
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APPENDIX B 


REENTRY TRAJECTORY EQUATIONS 


The folio-wing equations -were programmed on the analog computer for the 
example considered in this report. These simplified equations were derived for 
flight within the atmosphere and the primary assumptions include: a spherical 
nonrotating earth, small flight -path angles, and a constant gravity term. The 
derivation of these simplified equations and their applicability have been 
considered in a number of reports. See for instance reference 9- 


The equations are 



V = 




Y = 



V dt 


9 


3-75x10 



t 


o 


where 


h 

V 

9 

P 

r 



control value of lift -drag ratio 
altitude, m 

horizontal velocity, m/s 

final range, m 

total heat input, j/m 2 

atmosphere density, 1.225 e ^/ 71QO 

radius from earth center, 6.43X10 6 m 

local gravitational acceleration, 9*8 m/s 2 

drag loading, 0.004 m 2 /kg 
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